{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TAO remote client - Data-Services\n",
    "### The workflow in a nutshell\n",
    "TAO Data Services include 4 key pipelines:\n",
    "1. Offline data augmentation using DALI\n",
    "2. Auto labeling using TAO Mask Auto-labeler (MAL)\n",
    "3. Annotation conversion\n",
    "4. Groundtruth analytics\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:\n",
    "\n",
    "* Convert KITTI dataset to COCO format\n",
    "* Run auto-labeling to generate pseudo masks for KITTI bounding boxes\n",
    "* Apply data augmentation to the KITTI dataset with bounding boxe refinement\n",
    "* Run data analytics to collect useful statistics on the original and augmented KITTI dataset\n",
    "\n",
    "### Table of contents\n",
    "\n",
    "1. [Convert KITTI data to COCO format](#head-1)\n",
    "2. [Generate pseudo-masks with the auto-labeler](#head-2)\n",
    "3. [Apply data augmentation](#head-3)\n",
    "4. [Perform data analytics](#head-4)\n",
    "\n",
    "\n",
    "### Requirements\n",
    "Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import glob\n",
    "import subprocess\n",
    "import json\n",
    "import ast\n",
    "import time\n",
    "from IPython.display import clear_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "namespace = 'default'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install TAO remote client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# # SKIP this step IF you have already installed the TAO-Client wheel.\n",
    "! pip3 install nvidia-transfer-learning-client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# # View the version of the TAO-Client\n",
    "! nvtl --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set the remote service base URL and Token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FIXME\n",
    "\n",
    "1. Assign the ip_address and port_number in FIXME 1 and FIXME 2 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))\n",
    "2. Assign the ngc_api_key variable in FIXME 3\n",
    "3. Assign path of DATA_DIR in FIXME 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Define the node_addr and port number\n",
    "workdir = \"workdir_data_services\" # FIXME1\n",
    "host_url = \"http://<ip_address>:<port_number>\" # FIXME2 example: https://10.137.149.22:32334\n",
    "# In host machine, node IP address and port number can be obtained as follows,\n",
    "# node_addr: hostname -I\n",
    "# node_port: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'\n",
    "ngc_api_key = \"<ngc_api_key>\"  # FIXME 3 example: (Add NGC API key)\n",
    "data_dir = \"<DATA_DIR>\" # FIXME4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%env BASE_URL={host_url}/{namespace}/api/v1\n",
    "\n",
    "# Exchange NGC_API_KEY for JWT\n",
    "identity = json.loads(subprocess.getoutput(f'nvtl login --ngc-api-key {ngc_api_key}'))\n",
    "\n",
    "%env USER={identity['user_id']}\n",
    "%env TOKEN={identity['token']}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating workdir\n",
    "workdir = os.path.abspath(workdir)\n",
    "if not os.path.isdir(workdir):\n",
    "    os.makedirs(workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function to parse logs <a class=\"anchor\" id=\"head-1.1\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def my_tail(model_name_cli, id, job_id, job_type, workdir):\n",
    "\tstatus = None\n",
    "\twhile True:\n",
    "\t\ttime.sleep(10)\n",
    "\t\tclear_output(wait=True)\n",
    "\t\tlog_file_path = subprocess.getoutput(f\"nvtl {model_name_cli} get-log-file --id {id} --job {job_id} --job_type {job_type} --workdir {workdir}\")\n",
    "\t\tif not os.path.exists(log_file_path):\n",
    "\t\t\tcontinue\n",
    "\t\twith open(log_file_path, 'rb') as log_file:\n",
    "\t\t\tlog_contents = log_file.read()\n",
    "\t\tlog_content_lines = log_contents.decode(\"utf-8\").split(\"\\n\")        \n",
    "\t\tfor line in log_content_lines:\n",
    "\t\t\tprint(line.strip())\n",
    "\t\t\tif line.strip() == \"Error EOF\":\n",
    "\t\t\t\tstatus = \"Error\"\n",
    "\t\t\t\tbreak\n",
    "\t\t\telif line.strip() == \"Done EOF\":\n",
    "\t\t\t\tstatus = \"Done\"\n",
    "\t\t\t\tbreak\n",
    "\t\tif status is not None:\n",
    "\t\t\tbreak\n",
    "\treturn status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function to split tar files <a class=\"anchor\" id=\"head-1.1\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tarfile\n",
    "\n",
    "def split_tar_file(input_tar_path, output_dir, max_split_size=0.2*1024*1024*1024):\n",
    "\tos.makedirs(output_dir, exist_ok=True)\n",
    "\t\n",
    "\twith tarfile.open(input_tar_path, 'r') as original_tar:\n",
    "\t\tmembers = original_tar.getmembers()\n",
    "\t\tcurrent_split_size = 0\n",
    "\t\tcurrent_split_number = 0\n",
    "\t\tcurrent_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')\n",
    "\t\t\n",
    "\t\twith tarfile.open(current_split_name, 'w') as split_tar:\n",
    "\t\t\tfor member in members:\n",
    "\t\t\t\tif current_split_size + member.size <= max_split_size:\n",
    "\t\t\t\t\tsplit_tar.addfile(member, original_tar.extractfile(member))\n",
    "\t\t\t\t\tcurrent_split_size += member.size\n",
    "\t\t\t\telse:\n",
    "\t\t\t\t\tsplit_tar.close()\n",
    "\t\t\t\t\tcurrent_split_number += 1\n",
    "\t\t\t\t\tcurrent_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')\n",
    "\t\t\t\t\tcurrent_split_size = 0\n",
    "\t\t\t\t\tsplit_tar = tarfile.open(current_split_name, 'w')  # Open a new split tar archive\n",
    "\t\t\t\t\tsplit_tar.addfile(member, original_tar.extractfile(member))\n",
    "\t\t\t\t\tcurrent_split_size += member.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Convert KITTI data to COCO format <a class=\"anchor\" id=\"head-1\"></a>\n",
    "We would first convert the dataset from KITTI to COCO formats.\n",
    "\n",
    "### Define the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_name = \"annotations\"\n",
    "action = \"convert\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create dataset\n",
    "We support both KITTI and COCO data formats\n",
    "\n",
    "KITTI dataset follow the directory structure displayed below:\n",
    "```\n",
    "$DATA_DIR/dataset\n",
    "├── images\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── labels\n",
    "    ├── image_name_1.txt\n",
    "    ├── image_name_2.txt\n",
    "    ├── ...\n",
    "```\n",
    "\n",
    "And COCO dataset follow the directory structure displayed below:\n",
    "```\n",
    "$DATA_DIR/dataset\n",
    "├── images\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── annotations.json\n",
    "```\n",
    "For this notebook, we will be using the KITTI object detection dataset for this example. To find more details, please visit [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Dataset Links\n",
    "images_url = \"https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip\"\n",
    "labels_url = \"https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the dataset\n",
    "!wget -O images.zip {images_url}\n",
    "!wget -O labels.zip {labels_url}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!unzip -q images.zip -d {data_dir}/\n",
    "!unzip -q labels.zip -d {data_dir}/\n",
    "!mkdir -p {data_dir}/images {data_dir}/labels\n",
    "!mv {data_dir}/training/image_2/000* {data_dir}/images/\n",
    "!mv {data_dir}/training/label_2/000* {data_dir}/labels/\n",
    "!cd {data_dir} && tar -cf kitti_dataset.tar images labels\n",
    "!rm -rf images.zip labels.zip {data_dir}/training/ {data_dir}/training/ {data_dir}/testing/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_path = f\"{data_dir}/kitti_dataset.tar\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create dataset\n",
    "kitti_dataset_id = subprocess.getoutput(f\"nvtl {model_name} dataset-create --dataset_type object_detection --dataset_format kitti\")\n",
    "print(kitti_dataset_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(dataset_path)), model_name, \"kitti_to_coco\")\n",
    "split_tar_file(dataset_path, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    upload_dataset_message = subprocess.getoutput(f\"nvtl {model_name} dataset-upload --id {kitti_dataset_id} --path {os.path.join(output_dir,tar_dataset_path)}\")\n",
    "    print(upload_dataset_message)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List the created datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "message = subprocess.getoutput(f\"nvtl {model_name} list-datasets\")\n",
    "message = ast.literal_eval(message)\n",
    "for rsp in message:\n",
    "    rsp_keys = rsp.keys()\n",
    "    assert \"id\" in rsp_keys\n",
    "    assert \"type\" in rsp_keys\n",
    "    assert \"format\" in rsp_keys\n",
    "    assert \"name\" in rsp_keys\n",
    "    print(rsp[\"id\"],\"\\t\",rsp[\"type\"],\"\\t\",rsp[\"format\"],\"\\t\\t\",rsp[\"name\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an experiment\n",
    "annotation_conversion_experiment_id = subprocess.getoutput(f\"nvtl {model_name} experiment-create --network_arch {model_name} --encryption_key nvidia_tlt\")\n",
    "print(annotation_conversion_experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assign dataset\n",
    "docker_env_vars = {} # Update any variables to be included while triggering Docker run-time like MLOPs variables \n",
    "dataset_information = {\"inference_dataset\":kitti_dataset_id,\"docker_env_vars\": docker_env_vars}\n",
    "patched_model = subprocess.getoutput(f\"nvtl {model_name} patch-artifact-metadata --id {annotation_conversion_experiment_id} --job_type experiment --update_info '{json.dumps(dataset_information)}' \")\n",
    "print(patched_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set action specs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Default model specs\n",
    "annotation_conversion_specs = subprocess.getoutput(f\"nvtl {model_name} get-spec --action {action} --job_type experiment --id {annotation_conversion_experiment_id}\")\n",
    "annotation_conversion_specs = json.loads(annotation_conversion_specs)\n",
    "print(json.dumps(annotation_conversion_specs, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set specs\n",
    "annotation_conversion_specs[\"data\"][\"input_format\"] = \"KITTI\"\n",
    "annotation_conversion_specs[\"data\"][\"output_format\"] = \"COCO\"\n",
    "print(json.dumps(annotation_conversion_specs, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data format conversion action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run action\n",
    "convert_job_id = subprocess.getoutput(f\"nvtl {model_name} experiment-run-action --id {annotation_conversion_experiment_id} --action {action} --specs '{json.dumps(annotation_conversion_specs)}'\")\n",
    "print(convert_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "status = my_tail(model_name, annotation_conversion_experiment_id, convert_job_id, \"experiment\", workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the COCO annotations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_list = subprocess.getoutput(f\"nvtl {model_name} list-job-files --id {annotation_conversion_experiment_id} --job {convert_job_id} --job_type experiment --retrieve_logs True --retrieve_specs False\")\n",
    "print(file_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "temptar = subprocess.getoutput(f\"nvtl {model_name} download-entire-job --id {annotation_conversion_experiment_id} --job {convert_job_id} --job_type experiment --workdir {workdir}\")\n",
    "tar_command = f'tar -xvf {temptar} -C {workdir}/'\n",
    "os.system(tar_command)\n",
    "os.remove(temptar)\n",
    "print(f\"Results at {workdir}/{convert_job_id}\")\n",
    "convert_out_path = f\"{workdir}/{convert_job_id}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy annotations to the dataset\n",
    "!cp {convert_out_path}/{kitti_dataset_id}.json {data_dir}/annotations.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Generate pseudo-masks with the auto-labeler <a class=\"anchor\" id=\"head-2\"></a>\n",
    "Here we will use a pretrained MAL model to generate pseudo-masks for the converted KITTI data. \n",
    "\n",
    "### Define the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"auto_label\"\n",
    "action = \"generate\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the dataset\n",
    "We would be formatting the original dataset to include the COCO annotations generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reformatting the dataset\n",
    "# Untar to destination\n",
    "tar_command = f'mkdir -p {workdir}/{convert_job_id}_coco/ && tar -xf {dataset_path} -C {workdir}/{convert_job_id}_coco/'\n",
    "os.system(tar_command)\n",
    "\n",
    "# Copy the annotations\n",
    "copy_command = f'cp {convert_out_path}/{kitti_dataset_id}.json {workdir}/{convert_job_id}_coco/annotations.json'\n",
    "os.system(copy_command)\n",
    "\n",
    "# Tar the dataset\n",
    "tar_command = f'cd {workdir} && tar -cf {convert_job_id}_coco.tar {convert_job_id}_coco'\n",
    "os.system(tar_command)\n",
    "coco_data_path = f'{workdir}/{convert_job_id}_coco.tar'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create Dataset\n",
    "coco_dataset_id = subprocess.getoutput(f\"nvtl {model_name} dataset-create --dataset_type object_detection --dataset_format coco\")\n",
    "print(coco_dataset_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(coco_data_path)), model_name, \"coco_pseudo\")\n",
    "split_tar_file(coco_data_path, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    upload_dataset_message = subprocess.getoutput(f\"nvtl {model_name} dataset-upload --id {coco_dataset_id} --path {os.path.join(output_dir,tar_dataset_path)}\")\n",
    "    print(upload_dataset_message)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List the datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "message = subprocess.getoutput(f\"nvtl {model_name} list-datasets\")\n",
    "message = ast.literal_eval(message)\n",
    "for rsp in message:\n",
    "    rsp_keys = rsp.keys()\n",
    "    assert \"id\" in rsp_keys\n",
    "    assert \"type\" in rsp_keys\n",
    "    assert \"format\" in rsp_keys\n",
    "    assert \"name\" in rsp_keys\n",
    "    print(rsp[\"id\"],\"\\t\",rsp[\"type\"],\"\\t\",rsp[\"format\"],\"\\t\\t\",rsp[\"name\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an experiment\n",
    "network_arch = model_name\n",
    "pseudo_mask_experiment_id = subprocess.getoutput(f\"nvtl {model_name} experiment-create --network_arch {network_arch} --encryption_key mvidia_tlt\")\n",
    "print(pseudo_mask_experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find the PTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List all pretrained models for the chosen network architecture\n",
    "filter_params = {\"network_arch\": network_arch}\n",
    "message = subprocess.getoutput(f\"nvtl {model_name} list-experiments --filter_params '{json.dumps(filter_params)}'\")\n",
    "message = ast.literal_eval(message)\n",
    "for rsp in message:\n",
    "    rsp_keys = rsp.keys()\n",
    "    if \"encryption_key\" not in rsp.keys():\n",
    "        assert \"name\" in rsp_keys and \"version\" in rsp_keys and \"ngc_path\" in rsp_keys and \"additional_id_info\" in rsp_keys\n",
    "        print(f'PTM Name: {rsp[\"name\"]}; PTM version: {rsp[\"version\"]}; NGC PATH: {rsp[\"ngc_path\"]}; Additional info: {rsp[\"additional_id_info\"]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pretrained_map = {\"auto_label\" : \"mask_auto_label:trainable_v1.0\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filter_params = {\"network_arch\": network_arch}\n",
    "message = subprocess.getoutput(f\"nvtl {model_name} list-experiments --filter_params '{json.dumps(filter_params)}'\")\n",
    "message = ast.literal_eval(message)\n",
    "ptm = []\n",
    "for rsp in message:\n",
    "    rsp_keys = rsp.keys()\n",
    "    assert \"ngc_path\" in rsp_keys\n",
    "    if rsp[\"ngc_path\"].endswith(pretrained_map[network_arch]):\n",
    "        assert \"id\" in rsp_keys\n",
    "        ptm_id = rsp[\"id\"]\n",
    "        ptm = [ptm_id]\n",
    "        print(\"Metadata for model with requested NGC Path\")\n",
    "        print(rsp)\n",
    "        break\n",
    "print(ptm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the PTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ptm_information = {\"base_experiment\":ptm}\n",
    "patched_model = subprocess.getoutput(f\"nvtl {model_name} patch-artifact-metadata --id {pseudo_mask_experiment_id} --job_type experiment --update_info '{json.dumps(ptm_information)}' \")\n",
    "print(patched_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assign dataset\n",
    "docker_env_vars = {} # Update any variables to be included while triggering Docker run-time like MLOPs variables \n",
    "dataset_information = {\"inference_dataset\":coco_dataset_id,\"docker_env_vars\": docker_env_vars}\n",
    "patched_model = subprocess.getoutput(f\"nvtl {model_name} patch-artifact-metadata --id {pseudo_mask_experiment_id} --job_type experiment --update_info '{json.dumps(dataset_information)}' \")\n",
    "print(patched_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set action specs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Default model specs\n",
    "auto_label_generate_specs = subprocess.getoutput(f\"nvtl {model_name} get-spec --action {action} --job_type experiment --id {pseudo_mask_experiment_id}\")\n",
    "auto_label_generate_specs = json.loads(auto_label_generate_specs)\n",
    "print(json.dumps(auto_label_generate_specs, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set specs\n",
    "auto_label_generate_specs[\"gpu_ids\"] = [0]\n",
    "print(json.dumps(auto_label_generate_specs, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run action\n",
    "label_job_id = subprocess.getoutput(f\"nvtl {model_name} experiment-run-action --id {pseudo_mask_experiment_id} --action {action} --specs '{json.dumps(auto_label_generate_specs)}'\")\n",
    "print(label_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "status = my_tail(model_name, pseudo_mask_experiment_id, label_job_id, \"experiment\", workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the label masks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_list = subprocess.getoutput(f\"nvtl {model_name} list-job-files --id {pseudo_mask_experiment_id} --job {label_job_id} --job_type experiment --retrieve_logs True --retrieve_specs False\")\n",
    "print(file_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "temptar = subprocess.getoutput(f\"nvtl {model_name} download-entire-job --id {pseudo_mask_experiment_id} --job {label_job_id} --job_type experiment --workdir {workdir}\")\n",
    "tar_command = f'tar -xvf {temptar} -C {workdir}/'\n",
    "os.system(tar_command)\n",
    "os.remove(temptar)\n",
    "print(f\"Results at {workdir}/{label_job_id}\")\n",
    "model_downloaded_path = f\"{workdir}/{label_job_id}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy annotations to the dataset\n",
    "!cp {model_downloaded_path}/label.json {data_dir}/label.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Apply data augmentation <a class=\"anchor\" id=\"head-3\"></a>\n",
    "In this section, we run offline augmentation with the original dataset. During the augmentation process, we can use the pseudo-masks generated from the last step to refine the distorted or rotated bounding boxes.\n",
    "\n",
    "### Define the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"augmentation\"\n",
    "action = \"generate\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the dataset\n",
    "We would be formatting the dataset to include the generated mask information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Format the dataset\n",
    "copy_command = f'cp {workdir}/{label_job_id}/label.json {workdir}/{convert_job_id}_coco'\n",
    "os.system(copy_command)\n",
    "\n",
    "# Tar the dataset\n",
    "tar_command = f'cd {workdir} && tar -cvf {label_job_id}_coco.tar {convert_job_id}_coco'\n",
    "os.system(tar_command)\n",
    "coco_auto_label_data_path = f'{workdir}/{label_job_id}_coco.tar'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create dataset\n",
    "coco_aug_dataset_id = subprocess.getoutput(f\"nvtl {model_name} dataset-create --dataset_type object_detection --dataset_format coco\")\n",
    "print(coco_aug_dataset_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(coco_auto_label_data_path)), model_name, \"coco_auto_label\")\n",
    "split_tar_file(coco_auto_label_data_path, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    upload_dataset_message = subprocess.getoutput(f\"nvtl {model_name} dataset-upload --id {coco_aug_dataset_id} --path {os.path.join(output_dir,tar_dataset_path)}\")\n",
    "    print(upload_dataset_message)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List the datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "message = subprocess.getoutput(f\"nvtl {model_name} list-datasets\")\n",
    "message = ast.literal_eval(message)\n",
    "for rsp in message:\n",
    "    rsp_keys = rsp.keys()\n",
    "    assert \"id\" in rsp_keys\n",
    "    assert \"type\" in rsp_keys\n",
    "    assert \"format\" in rsp_keys\n",
    "    assert \"name\" in rsp_keys\n",
    "    print(rsp[\"id\"],\"\\t\",rsp[\"type\"],\"\\t\",rsp[\"format\"],\"\\t\\t\",rsp[\"name\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an experiment\n",
    "data_aug_experiment_id = subprocess.getoutput(f\"nvtl {model_name} experiment-create --network_arch {model_name} --encryption_key nvidia_tlt\")\n",
    "print(data_aug_experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assign dataset\n",
    "docker_env_vars = {} # Update any variables to be included while triggering Docker run-time like MLOPs variables \n",
    "dataset_information = {\"inference_dataset\":coco_aug_dataset_id,\"docker_env_vars\": docker_env_vars}\n",
    "patched_model = subprocess.getoutput(f\"nvtl {model_name} patch-artifact-metadata --id {data_aug_experiment_id} --job_type experiment --update_info '{json.dumps(dataset_information)}' \")\n",
    "print(patched_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set action specs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Default model specs\n",
    "augmentation_generate_specs = subprocess.getoutput(f\"nvtl {model_name} get-spec --action {action} --job_type experiment --id {data_aug_experiment_id}\")\n",
    "augmentation_generate_specs = json.loads(augmentation_generate_specs)\n",
    "print(json.dumps(augmentation_generate_specs, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change any spec key if required\n",
    "print(json.dumps(augmentation_generate_specs, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data augmentation action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run action\n",
    "augment_job_id = subprocess.getoutput(f\"nvtl {model_name} experiment-run-action --id {data_aug_experiment_id} --action {action} --specs '{json.dumps(augmentation_generate_specs)}'\")\n",
    "print(augment_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "status = my_tail(model_name, data_aug_experiment_id, augment_job_id, \"experiment\", workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Perform data analytics  <a class=\"anchor\" id=\"head-4\"></a>\n",
    "Next, we perform analytics with the KITTI dataset.\n",
    "\n",
    "### Assigning the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"analytics\"\n",
    "action = \"analyze\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an experiment\n",
    "data_analytics_experiment_id = subprocess.getoutput(f\"nvtl {model_name} experiment-create --network_arch {model_name} --encryption_key nvidia_tlt\")\n",
    "print(data_analytics_experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assign dataset\n",
    "docker_env_vars = {} # Update any variables to be included while triggering Docker run-time like MLOPs variables \n",
    "dataset_information = {\"inference_dataset\":kitti_dataset_id,\"docker_env_vars\": docker_env_vars}\n",
    "patched_model = subprocess.getoutput(f\"nvtl {model_name} patch-artifact-metadata --id {data_analytics_experiment_id} --job_type experiment --update_info '{json.dumps(dataset_information)}' \")\n",
    "print(patched_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set action specs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Default model specs\n",
    "analytics_analyze_specs = subprocess.getoutput(f\"nvtl {model_name} get-spec --action {action} --job_type experiment --id {data_analytics_experiment_id}\")\n",
    "analytics_analyze_specs = json.loads(analytics_analyze_specs)\n",
    "print(json.dumps(analytics_analyze_specs, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change any spec key if required\n",
    "print(json.dumps(analytics_analyze_specs, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data analytics action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run action\n",
    "analyze_job_id = subprocess.getoutput(f\"nvtl {model_name} experiment-run-action --id {data_analytics_experiment_id} --action {action} --specs '{json.dumps(analytics_analyze_specs)}'\")\n",
    "print(analyze_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "status = my_tail(model_name, data_analytics_experiment_id, analyze_job_id, \"experiment\", workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete model <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} experiment-delete --id {annotation_conversion_experiment_id}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} experiment-delete --id {pseudo_mask_experiment_id}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} experiment-delete --id {data_aug_experiment_id}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} experiment-delete --id {data_analytics_experiment_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete kitti dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} dataset-delete --id {kitti_dataset_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete coco dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} dataset-delete --id {coco_dataset_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete coco aug dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subprocess.getoutput(f\"nvtl {model_name} dataset-delete --id {coco_aug_dataset_id}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
